Gaussian mixture model based audio coding in a perceptual domain

Author

  • Erik Norvell
Abstract

Gaussian mixture model based vector quantization (GMM-VQ) is a powerful technique for structured vector quantization. This thesis describes the implementation and evaluation of a perceptual audio coder using GMM-VQ. The quantizer is combined with a perceptual transform using psychoacoustic pre- and post-filtering. The coder was tested for low-rate mono audio coding (64 kbps), and the results were compared to audio encoded with MPEG-1 Layer III (MP3). The evaluation was performed using a MUSHRA test, which demonstrated that the proposed coder can achieve results comparable to MP3. The coder was tested with both the psychoacoustic filter and a standard whitening LPC filter to evaluate the difference between them. Furthermore, the benefit of frequency warping in the psychoacoustic filter was tested by comparing it to a filter using linear frequency resolution.

Similar resources

Gaussian Mixture Model Based Coding of Speech and Audio

The transmission of speech and audio over communication channels has always required speech and audio coders with reasonable search and computational complexity and good performance relative to the corresponding distortion measure. This work introduces a coding scheme which works in a perceptual auditory domain. The input high dimensional frames of audio and speech are transformed to power spec...

Evaluation of Nonlinear Digital Audio Systems Through Transparency Reduction Using the Combined Test Signal

The well known and established methods used for distortion measurements cannot be used for detecting the errors that appear in digital audio systems based on perceptual coding and decoding. A test stimulus has to be used that has the time-varying properties similar to real signals such as speech and music. Furthermore, the procedure of quality evaluation has to include an estimate of the tempor...

Noisy audio speech enhancement using Wiener filters derived from visual speech

The aim of this paper is to use visual speech information to create Wiener filters for audio speech enhancement. Wiener filters require estimates of both clean speech statistics and noisy speech statistics. Noisy speech statistics are obtained from the noisy input audio while obtaining clean speech statistics is more difficult and is a major problem in the creation of Wiener filters for speech ...

Tree and filter optimization for audio compression in a wavelet-based perceptual audio coder

This paper outlines a new perceptual low bit rate audio coding scheme based on adapted wavelet representations. It claims wavelet tree and filter adaptation attending to a perceptual entropy-based method. To achieve such adaptive structure, a periodized wavelet packet transform is performed for each audio frame. After the transform, the encoder employs scalar adaptive quantization, controlled b...

Optimal and Generalized Multiple Descriptions Image Coding Transform in the Wavelet Domain

In this paper we propose a Multiple Description Image Coding (MDIC) scheme to generate two compressed and balanced rates descriptions in the wavelet domain (Daubechies biorthogonal (9, 7) wavelet) using pairwise correlating transform optimal and application method for Generalized Multiple Description Coding (GMDC) to image coding in the wavelet domain. The GMDC produces statistically correlated ...

Publication date: 2005